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Abstract 

A three-body potential function can account for interactions among triples of particles which 
are uncaptured by pairwise interaction functions such as Coulombic or Lennard- Jones poten- 
tials. Likewise, a multibody potential of order n can account for interactions among n-tuples 
of particles uncaptured by interaction functions of lower orders. To date, the computation of 
multibody potential functions for a large number of particles has not been possible due to its 
0(N n ) scaling cost. In this paper we describe a fast tree-code for efficiently approximating 
multibody potentials that can be factorized as products of functions of pairwise distances. 
For the first time, we show how to derive a Barnes-Hut type algorithm for handling inter- 
actions among more than two particles. Our algorithm uses two approximation schemes: 
1) a deterministic series expansion-based method; 2) a Monte Carlo-based approximation 
based on the central limit theorem. Our approach guarantees a user-specified bound on the 
absolute or relative error in the computed potential with an asymptotic probability guar- 
antee. We provide speedup results on a three-body dispersion potential, the Axilrod- Teller 
potential. 

Keywords: Fast multipole methods; Data structures; kd-trees; Axilrod- Teller potential; 
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Figure 1: An example multibody computation (n — 3). For each fixed argument Xi x , $(2;^) equals the 
summation of the entries (^(x^ , Xi 2 , Xi 3 ) in the shaded region corresponding to . 

1. Introduction 

In this paper, we generalize previous algorithmic frameworks for rapidly computing pair- 
wise summations to include higher-order summations. Suppose we are given a set of particles 
X = {x , ■ ■ ■ ,a;7v-i} in -D- dimensional space. 

For x E X and a n-tuple function </> : M D x • • ■ x ~R D — >■ R, we are interested in computing 



Jn copies 
i 



the following for: 

$(x; X x ■ ■ ■ x X) = "' Yl <l>(x,Xi2--- , x iJ (!) 

(n-i) copies x i2 £X\{x} x i3 ex\{x} x in ex\{x} 

V i-2<l3 in 1 <in 

Sums of the form Equation occur in molecular dynamics, protein structure prediction, 
and other similar contexts. Biomolecular simulations usually break down the interactions 
in complex chemical systems into balls-and-springs mechanical models augmented by tor- 
sional terms, pairwise point charge electrostatic terms, and simple pairwise dispersion (van 



1 In computing $(x), we fix one of the arguments of <j> as x and choose a (n — l)-subset from X^ n 1 - ) 
which does not contain x. 
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der Waals) interactions, etc. However, such pairwise (n = 2) interactions often fail to cap- 
ture important, complex non- additive interactions found in real systems. Though many 
researchers have argued that multibody potentials enable more accurate and realistic molec- 
ular modeling, the evaluation of n-body forces for n > 3 in systems beyond tiny sizes (less 
than 10,000 particles) has not been possible due to the unavailability of an efficient way to 
realize the computation. 

In this paper we focus on computing multibody potentials of the third order (n = 3), but 
frame our presentation so that the methods can easily be generalized to handle higher-order 
potentials. For concreteness, we consider the Axilrod- Teller potential (dispersion potential): 

, , N 1 + 3 cos 0i cos 9j cos 9 k 

<p{x it x jt x k ) = T , rig,, jTsTj fig (2) 

> ■ . /y .11" i ■ . rp j ^ ny . nf . ^ 

where 9i, 9j, 9% are the angles at the vertices of the triangle XiXjXk and || ■ || is the Euclidean 
distance metric. This potential jl| describes induced dipole interactions between triples of 
atoms, and is known to be important for the accurate computation of the physical properties 
of certain noble gases. 

This Paper. For the first time, we introduce a fast algorithm for efficiently computing 
multibody potentials for a large number of particles. We restrict the class of multibody 
potentials to those that can be factorized as products of functions of pairwise Euclidean 
distances. That is, 

Our algorithm achieves speedup by utilizing two approximation methods: a deterministic 
and a probabilistic one. The deterministic approximation is based on the analytic series- 
expansion-based approach in to handle potential functions that describe n-body 
interactions with n > 2. The probabilistic approach uses a Monte Carlo-based approximation 
based on the central limit theorem. Our algorithm can compute multibody potentials within 
user-specified bounds for relative or absolute error with an asymptotic probability guarantee. 

However, we would like to point out the following limitations in our algorithm. First of all, 
we do not present a full-fledged derivation of all three translation operators (namely the far- 
to-far, the far-to-local, and the local-to-local translation operators) for the general multibody 
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case. While we define the far-field expansion for a restricted class of multibody potentials, 
defining the local expansion for this same class is harder (see Section l3~4"|) . We would also 
like to point out that the hybrid deterministic/probabilistic approximation heuristic works 
under some partial distributions but not all. Indeed, there are configurations for which 
the speedup factor over the naive brute-force method is minimal. The Monte-Carlo based 
approximation relies on two theorems: 1) the central limit theorem from which we determine 
the number of required samples; 2) the Berry-Esseen theorem which characterizes the the 
rate at which the sample average converges to the true average. Both theorems provide only 
asymptotic guarantees. 

Our work utilizes and extends a framework for efficient algorithms for so-called gen- 
eralized N-Body Problems [5], which introduced multi-tree methods. The framework was 
originally developed to accelerate common bottleneck statistical computations based on 
distances; it utilizes multiple kd-trees and other spatial data structures to reduce compu- 
tation times both asymptotically and practically by multiple orders of magnitude. This 
work extends the framework with higher-order hierarchical series approximation techniques, 
demonstrating a fast multipole-type method for higher-order interactions for the first time, 
effectively creating a Multibody Multipole Method. 

Section [3] introduces the generalized N-body framework and describes a partial exten- 
sion of fast multipole-type methods to handle higher-order interactions; we will discuss the 
technical difficulties for deriving all of the necessary tools for the general multibody case. 
As a result, we utilize only a simple but effective approximation using the center-of-mass 
approximations. Section H] focuses on three-body interactions and introduces methods to 
do potential computations under both deterministic and probabilistic error criteria; the sec- 
tion also provides a description of the fast algorithm for the three-body case. Section [5] 
proves that our proposed algorithms can approximate potentials within user-specified error 
bounds. Section [6] shows experimental scalability results for our proposed algorithms against 
the naive algorithm under different error parameter settings. 
Notations. Throughout this paper, we use these common sets of notations: 
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• (Normal Distribution). This is denoted by A/"(/x, E) where /i and E are the mean 
and the covariance respectively. 

• (Vector Component). For a given vector v G M fc , we access its d-th component by 
v[d] where 1 < d < k (i.e. 1-based index). 

• (Multi-index Notation). Throughout this paper, we will be using the multi- index 
notation. A .D-dimensional multi-index a is a -D-tuple of non-negative integers and 
will be denoted using a bold lowercase Greek alphabet. For any D-dimensional multi- 
indices ck, (3 and any x G M D , 

\a\ = a[l] + ct[2] + ■ ■ ■ + a[D] 

a! = (a[l])!(a[2])!...(a[D])! 

x a = ( x [l]) a W(x[2]) a M . . . ( X [£)])«P] 

D a =df 1] df 2] ---d™ [D] 

a + /3 = (a[l]+/3[l],--- ,a[D]+(3[D]) 

a -13= ( a [l] - /3[1], • • • , a[D] - (3[D\) for a > (3. 

where di is a i-th directional partial derivative. Define a > (3 if a[d] > (3[d], and 
a. > p for p G Z + U {0} if a[d] > p for 1 < < D (and similarly for a < p). 

• (Size of a Point Set). Given a set S, it size is denoted by \S\. 

• (Probability Guarantee). We use the unbold Greek alphabet a. 

• (A Tree Node). A tree node represents a subset of a point set represented by the 
root node. Hence, we use the same notation as the previous. 

• (Representative Point of a Tree Node). Usually a geometric center is used but 
any point inside the bounding primitive of a tree node is chosen as well. For the tree 
node P, this is denoted as Cp. 
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(Child Nodes of an Internal Tree Node). Given a node N, denote its left and 
right child nodes by N L and N R respectively. 



2. Related Work 



§(x) - $(x) 


<e\ 







2. 1 . Error Bounds 

Due to its expensive computational cost, many algorithms approximate sums at the 
expense of reduced precision. The following error bounding criteria are used in the literature: 

Definition 2.1. r absolute error bound: For each for x G X , it computes $(a;) 

such that - < r. 

Definition 2.2. e relative error bound: For each $(a;) for x G X , compute such 
that 

Bounding the relative error is much harder because the error bound criterion is in terms 
of the initially unknown exact quantity. As a result, many previous methods 4|, |6| have 
focused on bounding the absolute error. The relative error bound criterion is preferred 
to the absolute error bound criterion in statistical applications in which high accuracy is 
desired. Our framework can enforce the following error form: 

Definition 2.3. (1 — a) probabilistic e relative/r absolute error: For each for 
x G X , compute such that with at least probability < 1 — a < 1, — $(a;) < 

e \${x)\ + r. 

2.2. Series Expansion 

A series of papers first laid the foundations for efficiently computing sums of pairwise 
potentials such as Coulombic and Yukawa potentials (Jj, 3, 4\. The common approach in 
these papers is to derive analytical series expansions of the given potential function in ei- 
ther Cartesian or spherical coordinate systems. The series expansion is then truncated after 
taking a fixed number of terms. The associated error bounds are derived from summing the 
truncated terms in an appropriate infinite geometric sum or bounding the remainder term 
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Figure 2: fcd-tree of a two-dimensional point set. At each level, the bounding box is split in half along 
the widest dimension. The solid points denote the points owned by each node. At each leaf node, we can 
enumerate each point with its depth-first rank. The minimum depth-first rank (inclusive) and the maximum 
depth-first rank (exclusive) is shown for each node. 



using Taylor's theorem. A recent line of work on efficient computation of pairwise function 
has focused on developing numerical representations of the potential matrix [4>(x m , x n )]m,n=ii 
rather than relying on analytical expansion of the potential function. [7] and j§] use singular 
value decomposition and the QR decomposition to compute the compressed forms of the 



2, Q 



take the "pseudo-particle" 



potential function and the three translation operators, 
approach by placing equivalent artificial charges on the bounding surface of the actual par- 
ticles by solving appropriate integral equations. All of these works have been limited to 
pairwise potential functions, and the approach does not naturally suggest a generalization 
to n-body potentials with n > 2. To our knowledge, no research has been performed on 
the problem of evaluating multibody potentials using a method more sophisticated than the 
0(N n ) brute- force algorithm with an ad- hoc cut-off distance. 



11 



12] 



3. Generalized iV-body Framework 



We use a variant of kd-trees 13[ to form hierarchical groupings of points based on their 
locations using the recursive procedure shown in Algorithm [TJ Initially, the algorithm starts 
with P = X (the entire point set). We split a given set of points along the widest dimension 
of the bounding hyper-rectangle into two equal halves at the splitting coordinate. We 
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continue splitting until the number of points is below some user-defined threshold called the 
leaf threshold. If the number of points owned by a node exceeds the leaf threshold, then it 
is called an internal node. Otherwise it is called a leaf node. Assuming that each split on a 
level results in the equal number of points on the left subset and the right subset P L and P R 
respectively, the runtime cost is C*(|A| log \X\). We note that the cost of building a kd-tree 



is negligible compared to the actual multibody potential computation (see Section 



Figure [2J The general framework for computing Equation ([T]) is formalized in 
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6]). Sec 

m 



Algorithm 1 BuildKdTree(P) 



if |P| is above the leaf threshold then 

Find the widest dimension d of the bounding box of P. 
Choose an axis-aligned split s along d. 

Split P = P L U P R where P L = {x G P | x[d] < s} and P R = P\P L . 
BuildKdTree(P l ), BuildKdTree(P r ) 

Form far-field moments of P by translating far-field moments of P L and P R . 
else 

Form far-field moments of P. 
Initialize summary statistics of P. 



This approach consists of the following steps: 

1. Build a spatial tree (such as /cd-trees) for the set of particles X and build far- field 
moments on each node of the tree (Bottom-up phase). 

2. Perform a multi-tree traversal over n-tuples of nodes (Approximation phase). 

3. Pre-order traverse the tree and propagate unincorporated bound changes downward 
(Top-down phase). 

Step 2 utilizes the procedure shown in Algorithm [2] (called by setting each Pj = X for 
1 < i < n), a recursive function that allows us to consider the n-tuples formed by choosing 
each Xi from Pf, we can gain efficiency over the naive enumeration of the n-tuples by using 
the bounding box and the moment information stored in each Pj. One such information is 
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P t Pj Pi Pj 

Figure 3: The lower and upper bound on pairwise distances between the points contained in a pair of nodes. 

Algorithm 2 MTPOTENTIALCANONICAL({Pj}™ =1 ) 

if CANSUMMARIZE({Pj}™ =1 ) (Try approximation.) then 

SuMMARlZE({P i } J ?l =1 , e, r, a) 

else 

if all of Si are leaves then 

MTPoTENTlALBASE({Pj}™ =1 ) (Base case.) 
else 

Find an internal node P k to split among {Pj}™ =1 . 
Propagate bounds of Pt to P k L and P^. 
MTPotentialCanonical({Pi, • ■ ■ , P k ^, P k L , P k+1 , • • ■ , P n }) 
MTPotentialCanonical({P 1; ■ ■ ■ , P k _ x , P k R , P k+1 , ■■■ , P n }) 
Refine summary statistics based on the two recursive calls. 

the distance bound computed using the bounding box (see Figure [3]). CanSummarize 
function first eliminates redundant recursive calls for the list of node tuples that satisfy 
the following condition: if there exists a pair of nodes Pj and Pj (i < j) among the node 
list Pi, ■ • • ,P n , such that the maximum depth-first rank of Pj is less than the minimum 
depth-first rank of Pj. In this case, the function returns true. See Figure [2] and |l7j . In 
addition, if any one of the nodes in the list includes one of the other nodes (i.e. there exists 
nodes Pj and Pj such that the minimum depth-first rank of Pj < the minimum depth-first 
rank of Pj < the maximum depth- first rank of Pj < the maximum depth- first rank of Pj), 
CanSummarize returns false. We do this because it is a bit tricky to count the number of 
tuples for each point in this case (see Figure H]). 



d'iPi.Pj) 
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(a) 



(b) 



(c) 



(d) 



Figure 4: For n = 3, four canonical cases of the three "valid" (i.e. the particle indices in each node are in 
increasing depth-first order) node tuples encountered during the algorithm: (a) All three nodes are equal; 
(b) Si and S2 are equal, and S3 comes later in the depth-first order; (c) S2 and S3 are equal and come later 
in the depth-first order; (d) All three nodes are different. 

Otherwise, CanSummarize function tests whether each potential sum for x G |J P m 

l<m<n 

can be approximated within the error tolerance determined by the algorithm. For example, 
if n — 4, we test for each xi G Pi, x 2 G P2, x 3 G P3, x 4 G P4, the following exact quantities 
can be approximated: 



$(xi;P 2 x P 3 x P 4 ) = 



E 



Xi 2 £P2\{xi} Xi 3 &P3\{xi} Xi A &Pi\{xi} 
«2<«3 «3<*4 



$(x 2 ;-Pi x ft x P 4 ) 



E E 



^ ] 4 > {. x 2) X i 1 ,Xi 3 ,Xi i) 



Xi l ePi\{x 2 } a; i3 eP3\{^2} a;i 4 6-P4\{a;2} 
H<«3 «3<*4 
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$(x 3 ;Pi x P 2 x P 4 ) 
$Or 4 ;Pi x P 2 x P 3 ) 



E E 



Xi 1 &Pi\{x 3 } x i2 &P 2 \{x 3 } ^i 4 &Pi\{x3} 
«1<«2 *2<«4 



Si 1 GPi\{x4} ^ijS^V^ 4 } ^GPsU^} 

i\<%2 «2<«3 



If the approximation is not possible, then the algorithm continues to consider the data 
at a finer granularity; it chooses an internal node P/~ (typically the one with the largest 
diameter) to split among {Pj}" =1 . Before recursing to two sub-calls in Line 9 and Line 10 
of Algorithm EJ the algorithm can optionally push quantities from a node that is being split 
to its child nodes (Line 8). After returning from the recursive calls, the node that was just 



split can refine summary statistics based on the results accumu 
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atec. 
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on its child nodes. The 
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18|. 



details of these operations are available in earlier papers 

The basic idea is to terminate the recursion as soon as possible, i.e. by considering a 
tuple of large subsets and avoiding the number of exhaustive leaf-leaf-leaf computations. 
We note that the CanSummarize and Summarize functions effectively replace unwieldy 
interaction lists used in FMM algorithms. Interaction lists in n-tuple interaction, if naively 
enumerated, can be large depending on the potential function <ft and the dimensionality D 
of the problem, whereas the generalized iV-body approach can handle a wide spectrum of 
problems without this drawback. 

3.1. Algorithm for Pairwise Potentials (n = 2) 

The general algorithmic strategy for pairwise potentials 0(-, •) is described in 
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161 ] . and consists of the following three main phases (see Figure [5]). Suppose we are given a 
set of "source" points (denoted as reference points) and a set of "target" points (denoted as 
query points^. 

1. Bottom-up phase: Compute far-field moments of order p in every leaf node of the 
reference tree. The resulting far-field expansion of each reference node P 2 is given by: 



$(x;P 2 ) = ^ 

ct>0 



( — 1)" 



Xi 2 eP2 



D a 4>{x -cp 2 ) = Y j M a (P 2 ,c P2 )D c 



ix - Cp 2 



a>0 



2 The terms "reference/query" have been used in the general framework we are applying 
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Figure 5: The reference points (the left tree) are hierarchically compressed and uncompressed when a pair 
of query (from the right tree) /reference nodes is approximated within an error tolerance. 



P 2 ) reads as "the potential sum on x due to the contribution of P 2 " and M a (P 2 , cp 2 ) 
as "the a-th far-field coefficient of P2 centered at cp 2 ." Because it is impossible to store 
an infinite number of far-field moments M a (P 2 , cp 2 ), we truncate the Taylor expansion 
up to the order p (determined either arbitrarily or by an appropriate error criterion): 



<S>(x;P 2 ;F(cp 2 ,p))= £ M a (P 2 , cp 2 )D a <t>{x - cp 2 ) 

\a\<p 



(4) 



such that 



$(x;P 2 ) -$(x;P 2 



is sufficiently small. $(x; P 2 ; F(cp 2 ,p)) reads as "the 
approximated potential sum on x due to the points owned by P 2 using up to the p-th 
order far-field expansion of P 2 centered at cp 2 ." 

For internal reference nodes, perform the far-to-far (F2F) translation to convert the 
far-field moments owned by the child nodes to form the far-field moments for their 
common parent node P 2 . For example, the far-field moments of P 2 centered at c p l is 
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shifted to cp 2 by: 

$(x;P 2 L ;F(c P2 ,p)) = E M 7 (P 2 L , Cft )(-l)T£P>(z - cp 2 ) (5) 

where 

M^P 2 \c P2 ) = £ ^— ^ (6) 

Note that there is no error incurred in each F2F translation, i.e. P 2 ; F(c P L,p)) = 
P 2 L ; F(c P2 ,p)) for any query point y from the intersection of the domains of x 
for P 2 L ; F(c P L,p)) and $(#; P 2 L ; F(cp 2 ,p)); the domain for which the far-field ex- 
pansion remains valid depends on the error bound criterion for each potential. The 
far-field moments of the parent node P 2 is the sum of the translated moments of its 

_ M a (P 2 L ,c pL )(c pL -cp 2 )-<-<* M ol (P 2 R ,c pR ){c pR -cp 2 )-<-<* 

child nodes: M^(P 2 ,c P2 ) = £ + 

Q!<7 

2. Approximation phase: For a given pair of the query and the reference nodes, 
determine the order of approximation and either (1) translate the far-field moments 
of the reference node to the local moments of the query node (2) or recurse to their 
subsets, if the F2L translation is more costly than the direct exhaustive method. 
Let us re- write the exact contribution of P 2 to a point x E P\. 

$(x;P 2 ) = E^?E M a (P 2 ,c P2 )D a+ ^(c Pl - c P2 )(x - c Pl f 



£ 

/3>0 



E ^D^(c Pl -x t2 ) 



xi 2 eP2 



(x - c Pl f = E MP2, c Pl )(x - c Pl f (7) 
/3>0 



where Np(P 2 , cpj reads as "the exact local moments □ contributed by the points in P 2 
centered at cp x " Truncating Equation (j7j) at |/3| < p' for some p' < p yields a direct 
local accumulation of order p. 

From the bottom- up phase, we know that \a\ < p. Similarly, we can store only a finite 
number of local moments up to the order p' < p and thus \f3\ < p' . We get the local 



3 We use N to denote the local moments because a "near-field" expansion is another widely used term 
for a local expansion. It avoids the potential notational confusion in the later parts of the paper. 
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expansion for Pi formed due to translated far-field moments of P 2 : 



<S>(x;P 2 ;N(c Pl ,p'))= E 
|/3|<p' 



£ M a (P 2 , C p 2 )^ + ^ 1)2 (c Pl - C p 2 ) 

H<p' 



(x - C Pl 



^/3 



|/3|<P' 



(8) 



where Np(P 2 ,cp 1 ) reads as "approximation to the exact local moments N^{P 2 ^cp 1 ) n 
and $(:r; P 2 \ N(cp 11 p')) as "the approximated potential sum on x due to the points 
in P 2 using up to the p-th order inexact local moments centered at Cp" . The F2L 
translation is applied only if 



^x;P 2 ;N(c Pl , P '))-^x;P 2 ) 
3. Top-down phase: Propagate the local moments of each 

quantities) to its child nodes using the local-to-local (L2L) operator. Suppose we have 
the following local expansion for x e Pi. 



is sufficiently small, 
query node (i.e. pruned 



*(x;F2L(P 1 ) U DL(Pi)-N(c Pl , PPl )) = E N a (F2L{P{) U DL(Pi), cp 1 )(xi 1 - c Pl ) a 

\<*\<p\ 

where p Pi is the maximum approximation order among (1) the F2L translations per- 
formed for Pi and all of the ancestor nodes of Pi (denoted by F2L(Pi)); and (2) the 
direct local accumulations of Pi and those passed down from all of the ancestors of Pi 
(denoted by DL(Pi)). Shifting the expansion to another center c* Pl G Pi is given by: 



$(x; F2L(Pi) U DL(Pi); N{c Pl ,p Pl )) 



(9) 



E (^)N^F2L(Pi)UDL(Pi),cp 1 )(c Pl -cp 1 f- 

/3><x V J 



\<*\<V\ 

= E N a (F2L(Pi)UDL(Pi),c Pl )(x-c Pl ] 



(x - c% ) 



(10) 



This shifted moments are added to the local moments of each child of Pi, in effect 
transmitting the pruned contributions downward. At each query leaf, we evaluate the 
resulting local expansion at each query point. 
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Pi 



P 3 



Figure 6: A far- field expansion at created by the moments of P2 and P3. Note the double-arrow between 
the nodes P2 and P3 corresponding to the basis functions D a ~ ax < 2 ~ ai 3 02, 3(^2 — P§) ( see Equation ITU)) . 

3.2. Far-field Expansion for Three-body Potentials (n = 3) 

In this section, we define far-field expansions for a three-body potential that is a product 
of functions of pairwise distances (see Equation ([[])): 



We define the far-field moments of a node the same way defined for the pairwise potential 
case. Suppose we are given three nodes P\ 7^ P2 7^ P3 from the tree. The following (n — 1)- 
nested sum expresses the contribution for x G Pi due to the other nodes P2 and P3: 



The basic goal here is to decompose Equation (|12p into sums of products of the far-field 
moments of each node. A far-field expansion for x^ G Pi induced by the far-field moments 



4>{x ix , X i2 ,Xi 3 ) = 0l j2 (Xij ,X i2 ) ■ 0l i3 (x h , X i3 ) • 02,3 {Xi 2 , Zj 3 ) 



(11) 




(12) 
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of P 2 and P3 is given by (see Figure | 
$(x;P 2 xP 3 ) 



= E E E [Xi \ c ^ 2 (-^d^ Mx -c P2 ) 

xi 2 eP2 xi 3 eP3 ai,2>o 

T {Xts ~ CP3 . )ai '\ -ir^D^Mx-c P3 ) 

«1,3>0 M 

( Xh c )g ,3 ( Cpi)OM ^, ^ 

^ 2 .3 ! ("2,3-/32,3)! 

<*2,3>0 /32,3<<*2,3 

By setting a = at 1)2 + 0=1,3 + 0:2,3 and pushing the summations over Xj 2 G P 2 and x i3 G P3 
inside, we get: 



♦m««)=E EE E ( ai Cf' s ) 



0:i,2 + /^2^ /^O - «1,2 - ^2,3 
«1,3 



a >" a i,2<aai,3<a-ai,2 /32,3<<*— 0:1,2—0:1,3 
M ai,2+/3 2 , 3 ( P 2, Cp 2 )M Q 

-ai, 2-/32, 3 (p 3 ,cp 3 )(-ir' 3 

D^i^K - cp 2 ) J D Ql >^ 1 , 3 (x il - cp3) J D a - ai - 2 - ai '^ 2 , 3 (cp 2 - c P3 ) (13) 



Truncating o at p-th order yields: 

<J>(x;P 2 x P 3 ;F(cp 2 x c Ps ,p)) 

= E E E E 

|ct|<p c*i,2<a «i,3<o:— cti,2 /32,3<o — cti, 2 — 021,3 



«1,2 + #2,3^ A* ~ Ol,2 - ^2,3 
Ol,2 / V Q l,3 



M ai,2+/32, 3 ( P 2, Cp 2 )M a 

-"1,2-/32,3 (p 3 ,cp 3 )(-ir' 3 

D a ^<t> 1)2 (x h - cp^D^fa^Xi, - cp^-^-^fo^cp, - cp,) (14) 

where P 2 x P 3 ; P(cp 2 x cp 3 ,p)) reads as "the p-th order far-field expansion at x due to 
the moments of P 2 centered at cp 2 and the moments of P3 centered at cp 3 ." 
Computational Cost of Evaluating the Far-field Expansion. The first three sum- 
mations over o, 0:1,2, cti,3 collectively contribute 0(p 3 ) terms, and the inner summation 
contributing at most 0(p 3 ) terms. Thus, evaluating the p-th order far-field expansion for a 
three-body potential on a single point takes O (p 6 ) time. 
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3.3. Far-field Expansion for General Multibody Potentials (n > 2) 

For a general multibody potential that can be expressed as products of pairwise functions 
(see Equation (j^J)), the far-field expansion induced by the points in P 2 , • ■ ■ , P n for x G Pi is: 

<5>(x-P 2 x ••■ x P n ) 



II E E — ^f^ i-ir^D^^-cp,) 



n e e 

2<s<i<n a 3! t>0 /3 Sl t<c*s 



[Xi s -cpj 133 '* {x it -c Pt 



\a S: t-(3 s 



(a Sj t - /3 Sit ) 



Focus on grouping and multiplying monomial powers of (xi k — cp k ) for each 2 < k < n 



a l,k + E ( a u,k-/3u,k)+ E /3fc,u 

(x ifc -c Pfc ) 11=2 



fc-l 



n -/3n,fe)! n Pk, 



u=2 



v=k+l 



k— 1 ra 

Let £ fc = ai )fe + ^ (a U)fe - (3 U}k ) + Pk,v and b k = — . 

«= 2 v=fe+l ai >fc ! n (a«,fc-/9«,*)l 11 Pk,v [ - 



Then, 



v=k+l 



$(x;P 2 x ••• xP n ) 

= II II E E E ^M^(p fc , CP j (-1)^* rr^Mx-cpjD^Mcp.-cn 

2<s<t<n 2<k<n a ljfc >0 a s ,t>0 /3 s ,t<« Sj t 

(15) 

Equation (j!5p is a convolution of far- field moments of P 2 , ■ ' ' ; Pn- We can truncate the ex- 



pansion above for terms for | ct \ 
includes the n = 2 and n = 3 cases 



l<r<s<n 



> p for some p > 0. Note that Equation ( 1X5]) 



$(x;P 2 x • • • x P n ;F(cp 2 x • • • x tp„,p)) 

= II n E E E E 

2<s<l<n2<Kn |oe[<p c*i,fc>0 a a ,t>0 s ,t<a s , t 



D a ^(j>xAx - c Pk )D a ^^ t {c Ps - c Pt 



(16) 



Computational Cost of Evaluating the Far-field Expansion. The summations over 

a ryS for 1 < r < s < n collectively contribute 0(p 3 ) terms, and each inner summation 
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Pi 



Figure 7: A local expansion created inside the node P\ at x by directly accumulating each point in Pi and 
P3 (see Equation (|17p ). We are not aware of a technique to express an interaction between a particle in P2 
and a particle in P3 (marked by the ? symbol) for p > 0. 

over (3 Sjt contributing at most 0(p 3 ) terms. Thus, evaluating the p-th order far-field ex- 
pansion for a general multibody potential of the form Equation |3f on a single point takes 
O (p^ 2 j time. In practice, we are forced to use p = for n > 2 unless most 
(t>p, q {xi p , XjJ's in Equation ([3]) are constant functions. 

3.4- Local Expansion for Three-body Potentials (n = 3) 

Unlike the far-field expansion case, we are presented a fundamental difficulty. In order 
to derive a local expansion, we need to express the influence of each non-evaluation point 
on the evaluation point x at a center near x. However, breaking up the interaction among 
the non-evaluation points (i.e. a^.'s in the arguments of <f)(x, x^, ■ ■ ■ ,Xj„_ 1 )) without loss 
of information is hard. To see this: take a three-body potential expressible in products of 
pairwise functions (see Figure [7]). Expanding near cp 1 inside the node Pi yields an expansion 
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valid for x G P%: 



$(x;P 2 xP 3 ) 

E E E 

Xi 2 £ p 2 ®i 3 £ P 3 <*1,2>0 



0:1,2' 



-{x-cpX 1 - 2 E 



«1.3>0 



0:1,3! 



E 

Ct2,3>0 



D a2 ' 3 <fe,3 (cjj -a i3 ) 
0:2,3! 



(^2 - CPii 



"2,3 



Again, let a = ai )2 + 0:1,3 + 0=2,3- Switching the orders of summations results: 

D a ^(t> h 2(c Pl -x i2 ) 



d>(.;P 2 xP 3 )=E E E 

ck>0 ct\ 2^0c CK 1 3 < CK — r> 1 2 



E 



Ol,2! 



E 

eP3 



D«i.s 1>3 (c Pl - x t3 ) Q2 ^ 2) 3(c Pl - x 



»3 y 



ai,3! 



02,3! 



E 



Q>0 



■(Xia - C Pl ) Q2 
(x-C A ) ai ' a+ai - 8 
(x - c Pl ) ai - 2+ai ' 3 



(17) 



£ £ N a {P 2 ,c Pl ) N a {P 3 ,c Pl ) 

CK1,2<CK 0:1^3 <CK — CK1 5 2 

We need the exponent of (x — c Pl ) to match a to be able to define the local moments inside 
Pi. Unless a 2 .3 = (i.e. ignore the interaction between a particle in the second set and a 
particle in the third set), this is not possible. Since we encounter a similar problem in the 
general case, we will skip its discussion. 

4. Simpler Algorithm for General Multibody Potentials 

Instead of trying to derive the full-fledged tools for general multibody potentials, we 



focus on deriving something simpler. Let us focus on the n = 3 case, 
pairwise disjoint nodes: Pi, P 2 , P3 and a monotonically decreasing^ 
such as (p(x 1 ,x 2 ,x 3 ) = \\ xl _ X2 \\"i,2\\ xl _x 3 \p,3\\ X2 _ X3 \p,3 , 

Vxi G Pi,$(xi;P 2 x P 3 ) = \P 2 \\P 3 \<j){c Pl ,c P2 ,c P3 ) 
Vxj G P 2 ,$(a; i ;Pi x P 3 ) = \P x \\P 3 \4){c Pl , c P2 , c P3 ) 
Vxfc G P 3 ,$(x fc ;Pi x P 2 ) = |Pi||P 2 |0(cp 1 ,Cp 2 ,Cp 3 ) 



r or a given set of three 
three-body potentials 



4 "Monotonic" multibody potentials decrease in value if one of the Euclidean distance arguments is in- 
creased while the other two are held constant. 
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which can be obtained by setting p = in Equation (I14p . This means that we can get a 
cheaper approximation using the number of points owned by each node. Using the pairwise 
minimum and maximum node distances yields: 

<f>(d u (P 1 ,P 2 ),d u (P 1 ,P 3 ),d u (P 2 ,P 3 )) < 0( CFl ,cp 2 , C p 3 ) < ^(d l (P 1 ,P 2 ),d l (P 1 ,P 3 ),d l (P 2 ,P 3 )) 
It is straightforward to generalize this for the n > 2 case. 

Non-monotonic Potentials: For non-monotonic potentials such as the Lennard- Jones 
potential 4>(xi,x 2 ) = -fj — \, we can compute the critical points of and determine the 
intervals of monotonicity of <fi and consider how <fi behaves in the distance bound range 
between d l {P\, P 2 ) and d u (P\, P 2 ). We take a simpler approach that results in an algorithm 
that is easier to code; we break up the potential into two parts such that 4>(xi, x 2 , ■ • • , x n ) = 
(p + (xi,x 2 , ■ ■ ■ , x n ) — <fi~(xi, x 2 , ■ ■ ■ , x n ), and get a lower and upper bound (though a looser 
bound) on the contributions from the positive potential <p + and negative potential <p~. 

4-1- Specifying the Approximation Rules 

The overall algorithm which also subsumes the pairwise potential case (n = 2) was 
shown in Algorithm [2J We can now specify the CanSummarize function for the general 
multibody case. For guaranteeing r absolute error bound criterion (Definition 12. ip . the 
CanSummarize function returns true if: 

|0(d w (P 1; P 2 ),--- ,rf"(P n _ 1 ,P n ))-0(rf'(P 1 ,P 2 ),--- ,d l (P n . u P n ))\ < ^ 

where T root = (^Zl) (i- e - the total number of tuples in each slice in Figured]). Let us also 
define Tj to be the number of tuples containing a fixed particle in P 4 (see Figure H]). For 
example, for n = 3, the corresponding Summarize function would accumulate for each node: 
for Pi: \P 2 \\P 3 \4>(c Pl ,Cp 2 ,Cp 3 ), for P 2 : \Pi\ \P 3 \4>(c Pl , Cp 2 , Cp 3 ), and for P 3 : |Pi| \P 2 \4>(c Pl , Cp 2 , c Ps ) 
Hybrid Absolute/Relative Error Guarantee. The algorithm for guaranteeing the 
hybrid absolute/relative error bound (Definition 12. 2j) deterministically (a = 0) is not so 
much different from that for guaranteeing the absolute error bound. In each node P, we 
maintain the lower bound on the accumulated potentials for the particles in P (denoted as 
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c R 






cq :* 




Figure 8: Three-body multipole methods for p = in a nutshell. 
$'(P), a summary statistic stored in P). The function CanSummarize returns true if, 
|0(d"(Px, P 2 ), • • • , tf(P„_i, Pn)) - 0(^(Pl, A), • • • , d\P n - lt P n ))\ 



< 



e min (&(Pj + 5 l (P i; P x x • • • x P^ x P i+1 x • • • x P n )) + r 
i<j<?i 



;is) 



where each ^(P; P a x- • -xP_ixP i+1 x- • -xP n ) = f] |P;|0(d"(Pi, P 2 ), • • • , d u {P n „ h P n )) 
(which is computed just using the contribution of the other nodes on the i-th node) is added 
to the currently running lower bound on each node $ z (Pj) to reflect the most recently avail- 
able information on the lower bound. $'(Pj) can be incremented and tightened as the com- 
putation progresses, either in the base case or when the recursive sub-calls in Algorithm |2] 
are completed (Line 11). 

Monte Carlo-based Approximations. The error bounds provided by the bounding 
boxes (see Figure [3]) assume that all pairs of points selected between the two nodes are 
collapsed to two positions that achieve the minimum distance (and vice versa for the max- 
imum distance); therefore, these bounds are very pessimistic and loose. Here we introduce 
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Algorithm 3 CanSummarize({P}™ =1 ): the Monte Carlo-based approximation. 



if C ■ rnumu < min{Ti, T 2 , T 3 } then 
for each P l G {P}™ =1 do 
if i == 1 or Pi 7^ P_x then 
for Xj G p do 

if CANSuMMARlZEMCPoiNT(xj,i, {Pj}™ =1 ) == false then 
return false 
return true 
else 

return false 



a method for approximating the potential sums with a probabilistic bound satisfying Defi- 
nition 12.31 We can trade determinism for further gain in efficiency. We have an additional 
parameter a that controls the probability level at which the deviation between each ap- 



proximation and its corresponding exact values holds. This was introduced first in 19|, |20j 



21] to handle 



for probabilistic approximations of aggregate sums and later extended in 
per-particle quantities. The theorem that we rely on for probabilistic approximation is the 
following: 

Theorem 4.1. Central limit theorem: Let f\, f 2 , ■ ■ ■ , f m be independent, identically 

m 

distributed samples from the probability distribution F with variance a 2 , and fx — — ^ f s be 

m s=l 

the sample mean of the samples. As m — )■ oo ; £t N(fi, a 2 jra). 

A widely accepted statistical rule of thumb asserts that 30 or more samples are usually 
enough to put a sample mean into the asymptotic regime. Berry-Esseen theorem character- 
izes the rate at which this convergence to normality takes place more precisely: 

Theorem 4.2. Berry-Esseen theorem: Let fx be the sample mean of m samples drawn 
from the distribution F , and let ll, a 2 , and p be the mean, variance, and third central moment 
of F. Let F m (x) be the cumulative distribution function of fx, and ty(x; fx, a 2 ) be the cdf of 
the Gaussian with mean fx and variance a 2 . Then there exists a positive constant C such 
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Algorithm 4 CanSummarizeMCPoint(z, i, {Si}™ =1 ): Pruning function for the Monte 

Carlo based approximation per each point. 
F x <- 

repeat 

Get a random n-tuple (x±, ■ • ■ , Xi-i, x, Xi + i, • • • , x n ) where Xj G Sj 

F x <- F x U • • • , x, • • • , x n )} 

until (z a / 2 <Tp FX < ^ and |F X | > 30) or \F X \ > m Umit 
return z a / 2 a^ FX < ^ 



that for all values of /i and m: 



\F m (fi)-^(fi;fi,a 2 )\< l/ ' 



which roughly says that the discrepancy between the normal distribution and the sample 
mean distribution goes down as For three-body potentials, suppose we are given the 
set of three nodes, Pi, P 2 , and P 3 . Let us consider x G Pi (similar approximations can be 
made for each point in P 2 and P 3 ), and the contribution of P 2 and P 3 to its potential sum: 

$(x;P 2 xP 3 ) = <P(x,x i2 ,x i3 ) 
x i2 ex\{x} Xi 3 ex\{x} 

«2<*3 

We can sample m potential values Xj 2 , Xj 3 ) from the empirical distribution F formed 

by the 3-tuples formed among Si, S2, and S3 that contain x in the list. From the m samples, 
we get the empirical distribution F^, from which we form an approximate P 2 x P 3 ): 

$(x;P 2 x P 3 ;F^) = Tijip-H = —^tixi^x^Xi') 

m s=l 

where Xq = x for all 1 < s < m. For sufficiently large values of m, we can assume that 
the discrepancy provided by the Berry-Esseen theorem is small and concentrate on the 
sample variance of the sample mean distribution. The sample variance of the sample mean 
distribution 07^^ is given by: 



^ 1 1 m 

a u ,. = — ;=■ = — =. > (©(Xj? ,Xis,Xi%) — Upij Y 
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Algorithm 5 SuMMARlZEMC({5 , i }™ =1 , {Tj}™ =1 , 0)\ Monte Carlo based approximation, 
for each $ G {^i}™ =1 do 

if i == 1 or 5, 7^ S'j-i then 

for G ^ do 

<- + T< • ftp.,, <- &(xi) + Ti ■ (jl F *i - zp^a^i) 

where a is the sample variance. Given m i.i.d. samples, with probability of at least (1 — a), 

P 2 x P 3 ; ) - P 2 x F 3 ) < I^/^ 

where z a /2 is the number of standard deviations on either side of Ji F H to give at least (1 — a) 
coverage under the normal distribution. 

Modifications to the algorithm. A Monte Carlo sampling based routine is shown in 
Algorithm [3J The function CanSummarize determines whether performing Monte Carlo 

n 

approximations (which involves iterating over each unique point x G |J Pi) with at least 

i=l 

™>Umit samples is computationally cheaper than the brute-force computation. ( is a global 
variable that dictates the desired amount of speedup needed for applying Monte Carlo 
approximations, rather than recursing to smaller subsets of the three nodes. If a desired 

n 

speedup could be achieved, it loops for each unique point in x G [j Pi and computes the 

i=i 

sample mean of the potential values of the tuples that contain x, and the corresponding 
variance of the sample mean until (1) the desired error is achieved; or (2) exceeds the 
number of trial samples mumit- Algorithm |3] is the form used for bounding the absolute 
error of each potential sum error by r with at least probability of (1 — a). For bounding 
the hybrid absolute/relative error with at least probability of (1 — a) (Definition I2.3[) . we 
replace the termination condition in the loop: z a /2a^ FX < with: 

e($'(Pi) + 2} (for*, - z a/2 a^ FXi ))+rTi 

±% ■ z a /2<? f M F x l s Tj^i uyj 



5. Correctness of the Algorithm 

The correctness of our algorithm for the deterministic hybrid absolute/relative error 
criterion is given by: 
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Theorem 5.1. Algorithm^ with the function CanSummarize with the relative error bound 
guarantee (Equation lT8\) produces approximation for G X such that 



|$0;J - ${x h )\ < e${x h )+T 



(20) 



Proof. (By mathematical induction) For simplicity, let us focus on n = 3. We induct on the 
number of points \P\ U P 2 U P 3 | encountered during the recursion of the algorithm. 
Base case: There are two parts to this part of the proof. 

• Line 1 of the function MTPotentialCanonical in Algorithm [2j any set of nodes 
Pi, P2, P 3 for which the function CanSummarize returns true satisfies the error 
bounds for x iu G S u for u = 1,2, 3: 



Vx tl G Pi, $(x h ;P 2 x P 3 ) - <S>( Xil ;P 2 x P 3 



< Txi u xP 2 xP 3 ( e $lfp\ T \ , - T x iu xi^Ar, 



\/x i2 G P 2 , 



J 1 roof 



$(x, 2 ;Pi xP 3 )-$(x J2 ;Pi xP 3 ) 



^(6$K)+r) 



Vxj 3 G P 3 , 



$(i !3 ;PixP 2 )-$(x !3 ;PixF 2 ) 



< 



P. 



e $'(P 3 )+T < 



P. 



liu X P 2 X P3 



(e$(x i3 ) + r) 



(21) 



'J'root \ J — J^root 

where T XiuX p 2X p 3 denotes the number of tuples chosen by fixing Xi u and selecting the 
other two from P 2 and P 3 and so on. 

• The function call MTPOTENTIALBASE in Algorithm |2j each x h G Pi and x i2 G P 2 
and Xi z G P 3 exchange contributions exactly and incur no approximation error. 

Inductive step: Suppose we are given the set of three nodes Pi, P 2 , and P 3 (at least 
one of which is an internal node) in the function MTPotentialCanonical. Suppose the 
three tuples Pi, P 2 , P 3 could not be pruned, and that we need to recurse on each child of 
Pi, P 2 , and P 3 . 

By assumption, CanSummarize returns false if any one of the nodes Pi, P 2 , P 3 includes 

one of the other nodes (see Section[3]). For n = 3, we can assume that the possible node tuple 
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cases that could be considered for pruning are shown in FigureHl Let {{P^}l =1 Y k=1 be the set 
of set of three nodes considered during the recursive sub-computations using the child nodes 
of each Pi, P 2 , and P3; note that the maximum value of t is 8 for three-body interactions. 
Note that for each k, Pj? is either (1) the node P s itself (2) the left child node of P s (3) the 
right child node of P s . Therefore, for each k = 1, 2, - • • , t, \P? U P 2 fe U P 3 fc | < \P X U P 2 U P 3 |. 
The equality holds when all of Pi, P2, and P3 are leaf nodes for which the error criterion is 
satisfied by the base case function (no error incurred). 

If any one of Pi, P2, and P3 is an internal node, then we are guaranteed that |P* U P 2 fc U 
P3 I < I Pi U P2 U P3I for all k — 1, • • • , t. We invoke the inductive hypothesis to conclude 
that for each k and for each Xi u G P^ for u = 1, 2, 3: 



Vx h G Pi fc , $0» i; P 2 fc x P*) - $( Xil ;P« x R 



k r>k\ if./ ^ . r>k .. r>k^ 



< Xn ^!: P " +r) < &M + r) 

$(x J2 ;P 1 fe xP|)-^(^ 2 ;P"xP 3 fc ) 



Vxj 2 G -P 2 , 



'J^root y \ z J J — J^root 

Vx 43 G Pt \Hx l3 ;P? x Pi) - <D(x i3 ; P} x P 2 fc ) 

< ^f^ +r) < ^ff^ (^) + r) 

where T s fc is the number of 3-tuples formed among P x , P 2 , P 3 fc that contain a fixed point in 
P s . By the triangle inequality, Equation |2D] holds by extending to Pi = P 2 = P3 = X since 
the number of encountered tuples for each particle add up to T root . □ 

We are now ready to prove the correctness of our algorithm for bounding the relative 
error probabilistically. 

Theorem 5.2. Algorithm IE with the function CanSummarize with the modification de- 
scribed in Equation fTPI produces approximations for Xi G X such that 

\^{Xi)-^{Xi)\ <€$(Xi)+T (22) 

with the probability of at least 1 — a for < a < 1, as the number of samples in the Monte 
Carlo approximation tends to infinity. 
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Proof. We extend the proof in Theorem 15.11 For simplicity, we again focus on the n = 3 
case. 

Base case: Given the set of three nodes with the desired failure probability a, the base 
case MTPotentialBase is easily shown to satisfy Equation EH with 100 % probability ( 
> 1 — a). Similarly, each Monte Carlo prune satisfies Equation |2T] with probability of 1 — a 
asymptotically. 

Inductive case: For a non-prunable set of three nodes {Pfc}f =1 for the required failure 
probability /?. Note that MTPotentialCanonical results in a maximum of four (i.e. 
2 3_1 = 4) sub-calls for a set of non-prunable Pi, P 2 , P3 nodes. For example, suppose Pi is 
an internal node, and consider its left child, P X L . The contribution of P2 and P3 on Pf can be 
computed by considering the node combinations: (Pf, P 2 L , P 3 L ), {P[, P 2 L , P R ), (P[, P R , P 3 ), 



{Pi, P 2 R , P^), resulting in a maximum of four combinations if Pi, P2, P3 satisfy the case 4(a" 



in Figure HI Each recursive sub-call is equivalent to a stratum in a stratified sampling, and 
satisfies the following: 
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Collectively, the results from these strata add up to potential estimates that satisfy the 
error bound with at least 1 — a probability for each Xi u G P^ and the following holds: 



${x lu ;P 2 xP 3 )-${x lu ;P 2 xP 3 



< 



cT XiuX p 2X p 3 ^ l TT XiuX p 2X p 3 

iproot %u ' iproot 



where T XiuX p 2XPa - T x ^ xP l xP l + T x ^ xP l xP r + T x ^ xP r xP l + T x ^ xP r xP r. The similar 
bounds hold for each x G P R , and the same reasoning can be extended to the bounds for P 2 
and P3. Because $ z (Pi) = min{$ / (P 1 L ), $ (P^ )} throughout the execution of the algorithm, 
we can extend the argument to the case where Pi = P 2 = P3 = X . □ 
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Tree building vs multibody computation on points distributed in an annulus 

Elapsed time in seconds 



Number of points 



1IXX1 2000 3000 4000 5000 601X1 7000 8000 9000 10O00 



Tree building jn a: = 0.5,e = 0.1 



Figure 9: Building the kd-tiee takes negligible amount of time compared to the time it takes for the actual 
multibody computation. 

6. Experiment Results 



All of our algorithms were based on an open-source C++ library called MLPACK [221. 123 1. 
The experiments were performed on a desktop with AMD Phenom II X6 HOOT Processors 
utilizing only one core with 8 GB of RAM. 

6.1. Tree Building 

The cost of tree-building is negligible compared to the actual multibody computation. 
Compared to complex, irregular memory access patterns encountered in the multibody com- 
putation (as do most recursive algorithms in general), the tree-building phase requires mostly 
sequential scanning of contiguous blocks of memory and thus requires shorter amount of time. 
See Figure [91 where the tree building is compared to the multibody computation with the 
relative error criterion e = 0.1 and the 50 % probability guarantee (a = 0.5). The annulus 
distribution was chosen deliberately to show that even under the distribution for which the 
multibody computation is relatively fast (see Section I6T21 . the tree building requires a tiny 

fraction of time compared to the computation time. 
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Figure 10: Speedup result on uniformly distributed points using the deterministic algorithm (a — 0). The 
base timings for the naive algorithm on each point set are: 1.91 x 10 1 seconds, 1.54 x 10 2 seconds, 5.17 x 10 2 
seconds, 1.23 x 10 3 seconds, 2.39 x 10 3 seconds, 4.16 x 10 3 seconds, 6.64 x 10 3 seconds, 9.76 x 10 3 seconds, 
1.43 x 10 4 seconds, and 1.92 x 10 4 seconds. 

6.2. Multibody Computation 

We demonstrate speedup results of our approximate algorithms guaranteeing the (1 — a) 
probabilistic e relative error criterion (Definition [2]3]). For this paper, we focus strictly on the 
relative error criterion (r = 0) and test on three relative error parameter values ( e = 0.001, 
e = 0.01, and e = 0.1). We test on three different types of distribution: uniform within 
the unit hypercube [0, l] 3 (denoted as the "uniform" distribution), the annulus distribution 
(denoted as the "annulus" distribution) in three dimensions, and uniform within the unit 
three-dimensional sphere (denoted as the "ball" distribution). These three distributions 
were also used in [24| ) . For the deterministic and probabilistic algorithms, the order of local 
expansion is fixed at p = and only 0-th order multipole expansions are used for the results. 
Deterministic Approximations. Figure [101 Figure [121 an d Figure [11] show speedup 
results against the naive algorithm using only the deterministic approximation (i.e. a = 0). 
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Figure 11: Speedup result on points distributed inside a sphere using the deterministic algorithm (a = 0). 
The base timings for the naive algorithms are listed in Figure 1101 

On the uniform distribution and the ball distribution, the speedup is almost non-existent; the 
speedup factor is a little bit more than two on the dataset containing 10, 000 points using 
the lowest parameter setting of e = 0.1. On the annulus distribution, our deterministic 
algorithm achieves a little bit better speedup against the naive algorithm; a factor of more 
than 20 times speedup on 10, 000 points is encountered on e = 0.1. A tree-based hierarchical 
method generally works better for clustered point sets, and this is reflected in our results. 
Monte-Carlo Approximations. In this section, we show whether adding indeterminism 
by sampling can reduce the computation time while guaranteeing a slightly relaxed error 
criterion (but with a high probability guarantee for each potential sum). We first relax 
the probability guarantee to be 90% (i.e. a = 0.1). Like the results shown using the 
deterministic algorithm, our Monte Carlo-based algorithm achieves the most speedup on 
points distributed in an annulus (1000 times speedup on 10,000 points using e = 0.1). See 
Figure [THJ Figure EJ and Figure [T5J 

We also list the percentage of the points actually achieving the e relative error bound 
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Speedup over naive on points distributed in an annulus (a=0) 




Figure 12: Speedup result on points distributed on an annulus using the deterministic algorithm ( 
The base timings for the naive algorithms are listed in Figure [TO] 




Figure 13: Speedup result on uniformly distributed points using the Monte Carlo-based algorithm (a 
The base timings for the naive algorithms are listed in Figure 1101 
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Figure 14: Speedup result on points distributed inside a sphere using the Monte Carlo-based algorithm 
(a = 0.1). The base timings for the naive algorithms are listed in Figure [TU1 




Figure 15: Speedup result on points distributed on an annulus using the Monte Carlo-based algorithm 
(a = 0.1). The base timings for the naive algorithms are listed in Figure [TU1 
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Table 1: The distribution of relative error on the uniform distribution using a = 0.1 and e = 0.001. 

along with the mean and the variance in Table [TJ Table HJ and Table [3J The relative error 
level of 0.001 and the probability guarantee of 90% was used. Under all three distributions, 
the percentage of points whose potential sum achieved the desired relative error of 0.001 was 
well above 90%. We list the average relative error, the variance, and the maximum relative 
error. Note that the maximum relative error can exceed 100% if the true potential sum and 
its approximation have opposite signs. For a particle with a small potential sum, we have 
observed that this is indeed the case due to numerical inaccuracies accumulated during the 
summation. 

7. Conclusion 

In this paper, we have introduced the framework for extending the pairwise series ex- 
pansion to potentials that involve more than two points. Through this process, we have 
formally defined an analogue to the far-field expansion for approximating the multibody 
potentials in a hierarchical fashion as done in traditional FMM algorithms and have derived 
algorithms for guaranteeing (1) absolute error bound (2) relative error bound (3) proba- 
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Table 2: The distribution of relative error on the ball distribution using a = 0.1 and e = 0.001. 
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Table 3: The distribution of relative error on the annulus distribution using a = 0.1 and e = 0.001. 
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bilistic absolute/relative error on each particle potential sum and proved the correctness of 
our algorithms formally. However, we do not present a full-fledged derivation of all three 
translation operators and the analogue to the local expansion due to a technical difficulty. 
Instead, we propose to use only a monopole approximation (p = 0) in a simpler alternative 
algorithm. Our experiment demonstrates that the algorithm using the hybrid determinis- 
tic/probabilistic approximation heuristic achieves speedup under points lying on an annulus 
of a sphere (i.e. lower-dimensional manifold). For our future work, we are working on 



parallelization as done in 
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